Clustering Change Detection Using Normalized Maximum Likelihood Coding
نویسندگان
چکیده
We are concerned with the issue of detecting changes of clustering structures from multivariate time series. From the viewpoint of the minimum description length (MDL) principle, we introduce an algorithm that tracks changes of clustering structures so that the sum of the code-length for data and that for clustering changes is minimum. Here we employ a Gaussian mixture model (GMM) as representation of clustering, and compute the code-length for data sequences using the normalized maximum likelihood (NML) coding. The introduced algorithm enables us to deal with clustering dynamics including merging, splitting, emergence, disappearance of clusters from a unifying view of the MDL principle. We empirically demonstrate using artificial data sets that our proposed method is able to detect cluster changes significantly more accurately than an existing statistical-test based method and AIC/BIC-based methods. We further use real customers’ transaction data sets to demonstrate the validity of our algorithm in market analysis.
منابع مشابه
NORCAMA: Change Analysis in SAR Time Series by Likelihood Ratio Change Matrix Clustering
This paper presents a likelihood ratio test based method of change detection and classification for synthetic aperture radar (SAR) time series, namely NORmalized Cut on chAnge criterion MAtrix (NORCAMA). This method involves three steps: 1) multitemporal pre-denoising step over the whole image series to reduce the effect of the speckle noise; 2) likelihood ratio test based change criteria betwe...
متن کاملSpeaker diarization using normalized cross likelihood ratio
In this paper, we present the Normalized Cross Likelihood Ratio (NCLR) and the advantages of using it in a speaker diarization system. First, the NCLR is used as a dissimilarity measure between two Gaussian speaker models in the speaker change detection step and its contribution to the performance of speaker change detection is compared with those of BIC and Hostelling’s T-Statistic measures. T...
متن کاملNormalized Maximum Likelihood Coding for Exponential Family with Its Applications to Optimal Clustering
We are concerned with the issue of how to calculate the normalized maximum likelihood (NML) code-length. There is a problem that the normalization term of the NML code-length may diverge when it is continuous and unbounded and a straightforward computation of it is highly expensive when the data domain is finite . In previous works it has been investigated how to calculate the NML code-length f...
متن کاملBearing Fault Detection Based on Maximum Likelihood Estimation and Optimized ANN Using the Bees Algorithm
Rotating machinery is the most common machinery in industry. The root of the faults in rotating machinery is often faulty rolling element bearings. This paper presents a technique using optimized artificial neural network by the Bees Algorithm for automated diagnosis of localized faults in rolling element bearings. The inputs of this technique are a number of features (maximum likelihood estima...
متن کاملSignal processing approaches as novel tools for the clustering of N-acetyl-β-D-glucosaminidases
Nowadays, the clustering of proteins and enzymes in particular, are one of the most popular topics in bioinformatics. Increasing number of chitinase genes from different organisms and their sequences have beenidentified. So far, various mathematical algorithms for the clustering of chitinase genes have been used butmost of them seem to be confusing and sometimes insufficient. In the...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2012